Acquiring Word Similarities with Higher Order Association Mining
Abstract
We present a novel approach to mining word similarity in Textual Case-Based Reasoning. To estimate the similarity of words, we exploit their indirect associations in addition to the direct ones. If word A co-occurs with word B, we say that A and B share a first-order association. If A co-occurs with B in some documents, and B with C in some others, then A and C are said to share a second-order co-occurrence via B. Higher orders of co-occurrence may be defined similarly. In this paper we present algorithms for mining higher-order co-occurrences. A weighted linear model combines the contributions of these higher orders into a word similarity model. Our experimental results demonstrate significant improvements over similarity models based on first-order co-occurrences alone. Our approach also outperforms state-of-the-art techniques such as SVM and LSI in classification tasks of varying complexity.
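The definitions in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that a second-order link between A and C can be approximated by counting the intermediate words B that co-occur with both (without checking that the two co-occurrences come from different documents), and the function names, the weights `w1` and `w2`, and the toy documents are all hypothetical.

```python
from itertools import combinations


def first_order(docs):
    """Map each word to the set of words it shares at least one document with."""
    neighbours = {}
    for doc in docs:
        words = set(doc)
        for w in words:
            neighbours.setdefault(w, set())
        for a, b in combinations(sorted(words), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def second_order(neighbours):
    """Count, for each pair (a, c), the intermediate words b linked to both.

    Simplifying assumption: any b that co-occurs with a and with c counts,
    whether or not the two co-occurrences are in different documents.
    """
    counts = {}
    for b, linked in neighbours.items():
        for a, c in combinations(sorted(linked), 2):
            counts[(a, c)] = counts.get((a, c), 0) + 1
    return counts


def similarity(docs, w1=0.7, w2=0.3):
    """Weighted linear combination of first- and second-order evidence.

    The weights w1 and w2 are illustrative values, not taken from the paper.
    """
    neighbours = first_order(docs)
    paths = second_order(neighbours)
    max_paths = max(paths.values(), default=1)  # scale path counts to [0, 1]
    sim = {}
    for a, c in combinations(sorted(neighbours), 2):
        direct = 1.0 if c in neighbours[a] else 0.0
        indirect = paths.get((a, c), 0) / max_paths
        sim[(a, c)] = w1 * direct + w2 * indirect
    return sim


# Toy corpus: "car" and "motor" never share a document but are linked via "engine".
docs = [["car", "engine"], ["engine", "motor"], ["motor", "vehicle"]]
sim = similarity(docs)
```

In this toy corpus, "car" and "motor" receive a non-zero score only through the second-order term, while "car" and "vehicle" (which would need a third-order link) score zero.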
Related resources
Competitive Intelligence Text Mining: Words Speak
Competitive intelligence (CI) has become one of the major subjects for researchers in recent years. The present research aims to achieve part of CI by investigating scientific articles in this field through text mining in three interrelated steps. In the first step, a total of 1143 articles released between 1987 and 2016 were selected by searching the phrase "competitive intellige...
An Incremental Algorithm to find Asymmetric Word Similarities for Fuzzy Text Mining
Synonymy – different words with the same meaning – is a major problem for text mining systems. We have proposed asymmetric word similarities as a possible solution to this problem, where the similarity between words is computed on the basis of the similarities between contexts in which the words appear, rather than on their syntactic identity. In this paper, we give details of an incremental al...
Automatic Acquisition of Context-Specific Lexical Paraphrases
Lexical paraphrasing aims at acquiring word-level paraphrases. It is critical for many Natural Language Processing (NLP) applications, such as Question Answering (QA), Information Extraction (IE), and Machine Translation (MT). Since the meaning and usage of a word can vary in distinct contexts, different paraphrases should be acquired according to the contexts. However, most of the existing res...
Phonological templates in early words
Both formalists and functionalists have proposed that universal phonetic or phonological principles govern early word production, yet the wide range of individual differences in this period continues to resist coherent formulation in such terms, even across children acquiring a single language. This study explores the extent of within- and between-language similarities and differences in phonolog...
Syntactic Dependencies and Distributed Word Representations for Analogy Detection and Mining
Distributed word representations capture relational similarities by means of vector arithmetic, giving high accuracies on analogy detection. We empirically investigate the use of syntactic dependencies for improving Chinese analogy detection based on distributed word representations, showing that dependency-based embeddings do not perform better than n-gram-based embeddings, but dependenc...